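The self-attention mechanism, popularized by the transformer architecture, has been applied successfully to many computer vision tasks, including image recognition and object detection. Despite this surge, the use of transformers for stereo matching remains relatively unexplored. In this paper, we comprehensively investigate the use of transformers for stereo matching, especially for laparoscopic videos, and propose a new hybrid deep stereo matching framework (HybridStereoNet) that combines the best of CNNs and transformers in a unified design. Specifically, we investigate several ways of introducing transformers into volumetric stereo matching pipelines by analyzing the loss landscape of each design and its in-domain/cross-domain accuracy. Our analysis suggests that using transformers for feature representation learning, while using CNNs for cost aggregation, leads to faster convergence, higher accuracy, and better generalization than the other options. Our extensive experiments on the SceneFlow, SCARED2019, and DVPN datasets demonstrate the superior performance of HybridStereoNet.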
translated by 谷歌翻译
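Tabular data is one of the most common data storage formats in business applications, ranging from retail and banking to e-commerce. These applications rely heavily on machine learning models to achieve business success. One of the key problems in learning from tabular data is distinguishing influential features from all the predetermined features. Global feature selection, which assumes that all instances share the same influential feature subset, has been studied for a long time. In practice, however, different instances rely on different feature subsets, which has given rise to instance-wise feature selection, a topic receiving increasing attention in recent studies. In this paper, we first propose a novel method for discovering instance-wise influential features for tabular data (DIWIFT), the core of which is to introduce the influence function to measure the importance of an instance-wise feature. DIWIFT can automatically discover influential feature subsets of different sizes in different instances, unlike global feature selection, which treats all instances as having the same influential feature subset. Moreover, unlike previous instance-wise feature selection methods, DIWIFT minimizes the validation loss on the validation set and is therefore more robust to distribution shift between the training and test datasets, which is important for tabular data. Finally, we conduct extensive experiments on both synthetic and real-world datasets to validate the effectiveness of DIWIFT and compare it with baseline methods. We also demonstrate the robustness of our method through ablation experiments.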
A diagnosis-oriented dialogue system queries the patient's health condition and predicts possible diseases through continuous interaction with the patient. A few studies use reinforcement learning (RL) to learn the optimal policy from the joint action space of symptoms and diseases. However, existing RL (or non-RL) methods do not achieve sufficiently good prediction accuracy, which remains far from its upper limit. To address this problem, we propose DxFormer, a decoupled automatic diagnostic framework that divides the diagnosis process into two steps, symptom inquiry and disease diagnosis, where the transition from symptom inquiry to disease diagnosis is explicitly determined by stopping criteria. In DxFormer, we treat each symptom as a token, and formalize symptom inquiry and disease diagnosis as a language generation model and a sequence classification model, respectively. We use an inverted version of the Transformer, i.e., a decoder-encoder structure, to learn the representation of symptoms by jointly optimizing the reinforce reward and the cross-entropy loss. Extensive experiments on three public real-world datasets demonstrate that our proposed model can effectively learn doctors' clinical experience and achieve state-of-the-art results in terms of symptom recall and diagnostic accuracy.
In recent years, interest has arisen in using machine learning to improve the efficiency of automatic medical consultation and enhance patient experience. In this article, we propose two frameworks to support automatic medical consultation, namely doctor-patient dialogue understanding and task-oriented interaction. We create a new large medical dialogue dataset with multi-level fine-grained annotations and establish five independent tasks, including named entity recognition, dialogue act classification, symptom label inference, medical report generation, and diagnosis-oriented dialogue policy. We report a set of benchmark results for each task, which shows the usability of the dataset and sets a baseline for future studies. Both code and data are available from https://github.com/lemuria-wchen/imcs21.
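Human-object interaction (HOI) detection is a fundamental task in high-level human-centric scene understanding. We propose PhraseHOI, which contains an HOI branch and a novel phrase branch, to leverage language and improve relation expression. Specifically, the phrase branch is supervised by semantic embeddings, whose ground truths are automatically converted from the original HOI annotations without extra human effort. Meanwhile, a novel label composition method is proposed to deal with the long-tailed problem in HOI, composing novel phrase labels from semantic neighbors. Further, to optimize the phrase branch, a loss composed of a distilling loss and a balanced triplet loss is proposed. Extensive experiments are conducted to demonstrate the effectiveness of the proposed PhraseHOI, which achieves a significant improvement over the baseline and surpasses previous state-of-the-art methods on the HICO-DET benchmark.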
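Lesion detection is a fundamental problem in computer-aided diagnosis schemes for mammography. Advances in deep learning techniques have brought remarkable progress on this task, provided that the training data are sufficiently diverse in image style and quality. In particular, the diversity of image styles can be attributed mainly to the vendor factor. However, collecting mammograms from as many vendors as possible is very expensive and is sometimes impractical for laboratory-scale studies. Accordingly, to further extend the generalization capability of deep learning models to various vendors with limited resources, a new contrastive learning scheme is developed. Specifically, the backbone network is first trained with a multi-style and multi-view unsupervised self-learning scheme to embed features that are invariant to various vendor styles. Afterward, the backbone network is recalibrated to the downstream task of lesion detection with specific supervised learning. The proposed method is evaluated on mammograms from four vendors and one unseen public dataset. The experimental results suggest that our approach can effectively improve detection performance on both seen and unseen domains, and that it outperforms many state-of-the-art (SOTA) generalization methods.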
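Causal inference is the estimation of the causal effect in a causal relationship when an intervention is applied. Precisely, in a causal model with binary interventions, i.e., control and treatment, the causal effect is simply the difference between the factual and the counterfactual. The difficulty is that the counterfactual must be estimated, so the causal effect can only be an estimate. The main challenge in estimating the counterfactual is identifying the confounders that affect both the outcome and the treatment. A typical approach is to formulate causal inference as a supervised learning problem so that the counterfactual can be predicted. Recent machine learning methods, including linear regression and deep learning models, have been adapted to causal inference. In this paper, we propose a method to estimate causal effects by using the variational information bottleneck (CEVIB). The promising point is that VIB is able to naturally distill confounding variables from the data, which enables the causal effect to be estimated from observational data. We compare CEVIB with other methods by applying them to three datasets and show that our approach achieves the best performance. We also show experimentally the robustness of our method.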
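The supervised-learning view of causal inference described above can be illustrated with a minimal sketch. This is not CEVIB: it swaps in plain ordinary least squares as the outcome model, and the synthetic data, the variable names (`x`, `t`, `y`), and the true effect of 2.0 are all assumptions made purely for illustration. The sketch predicts both potential outcomes for every unit and averages their difference, then contrasts that with a naive comparison of group means that ignores the confounder.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic data (illustrative assumption): x confounds treatment and outcome.
x = rng.standard_normal(n)
t = (rng.random(n) < 1.0 / (1.0 + np.exp(-x))).astype(float)  # treatment depends on x
true_effect = 2.0
y = 1.0 + 3.0 * x + true_effect * t + 0.1 * rng.standard_normal(n)

# Outcome model y ~ 1 + x + t, fitted by ordinary least squares.
design = np.column_stack([np.ones(n), x, t])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

# Predict the outcome under treatment and under control for every unit;
# one of the two is the factual, the other the estimated counterfactual.
y_treated = np.column_stack([np.ones(n), x, np.ones(n)]) @ coef
y_control = np.column_stack([np.ones(n), x, np.zeros(n)]) @ coef

# Average causal effect: mean difference between the two predicted outcomes.
ate_adjusted = float(np.mean(y_treated - y_control))

# Naive estimate that ignores the confounder, for comparison.
ate_naive = float(y[t == 1].mean() - y[t == 0].mean())

print(f"adjusted estimate: {ate_adjusted:.3f}")
print(f"naive estimate:    {ate_naive:.3f}")
```

In this toy setting the adjusted estimate should land close to the assumed true effect, while the naive difference of means is inflated because treated units tend to have larger values of the confounder.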
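Accurate segmentation of cardiac structures can help doctors diagnose diseases and improve treatment planning, and is in high demand in clinical practice. However, the shortage of annotations and the variance of the data across vendors and medical centers limit the performance of advanced deep learning methods. In this work, we present a fully automatic method for segmenting cardiac structures in MRI volumes, including the left (LV) and right ventricle (RV) blood pools as well as the left ventricular myocardium (Myo). Specifically, we design a semi-supervised learning method that leverages unlabeled MRI sequence timeframes through label propagation. We then exploit style transfer to reduce the variance across centers and vendors for more robust cardiac image segmentation. We evaluate our method in the M&M challenge 7, ranking second among 14 competing teams.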
A recent study has shown a phenomenon called neural collapse in that the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
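For readers unfamiliar with the simplex equiangular tight frame (ETF) mentioned above, the following is a minimal sketch of the standard construction used in the neural collapse literature. It is not the regularizer proposed in the paper; the function name `simplex_etf` and the choice of a random orthonormal basis are assumptions made for illustration. The columns it returns have unit norm and pairwise cosine similarity of -1/(K-1), which is the equiangular, maximally separated structure the abstract refers to.

```python
import numpy as np


def simplex_etf(num_classes: int, dim: int, seed: int = 0) -> np.ndarray:
    """Return a (dim, num_classes) matrix whose columns form a simplex ETF."""
    if num_classes < 2:
        raise ValueError("need at least two classes")
    if dim < num_classes:
        raise ValueError("this construction assumes dim >= num_classes")
    rng = np.random.default_rng(seed)
    # Orthonormal columns U (dim x K) from the reduced QR of a Gaussian matrix.
    u, _ = np.linalg.qr(rng.standard_normal((dim, num_classes)))
    k = num_classes
    # Centering matrix I - (1/K) * 1 1^T removes the mean direction.
    centering = np.eye(k) - np.ones((k, k)) / k
    return np.sqrt(k / (k - 1)) * (u @ centering)


etf = simplex_etf(num_classes=4, dim=16)
gram = etf.T @ etf  # pairwise inner products between class vectors
off_diagonal = gram[~np.eye(4, dtype=bool)]

print(np.round(gram, 3))
```

The Gram matrix has ones on the diagonal and -1/(K-1) everywhere else, so every pair of class vectors is separated by the same angle.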
Panoptic Part Segmentation (PPS) unifies panoptic segmentation and part segmentation into one task. Previous works use separate approaches to handle thing, stuff, and part predictions, without shared computation or task association. We aim to unify these tasks at the architectural level, designing the first end-to-end unified framework, named Panoptic-PartFormer. Moreover, we find that the previous metric, PartPQ, is biased toward PQ. To handle both issues, we make the following contributions. Firstly, we design a meta-architecture that decouples the part feature from the things/stuff feature. We model things, stuff, and parts as object queries and directly learn to optimize all three forms of prediction as a unified mask prediction and classification problem. We term this model Panoptic-PartFormer. Secondly, we propose a new metric, Part-Whole Quality (PWQ), to better measure this task from both pixel-region and part-whole perspectives. It can also decouple the errors of part segmentation and panoptic segmentation. Thirdly, inspired by Mask2Former and building on our meta-architecture, we propose Panoptic-PartFormer++ and design a new part-whole interaction scheme using masked cross attention to further boost part segmentation quality. Finally, extensive ablation studies and analysis demonstrate the effectiveness of both Panoptic-PartFormer and Panoptic-PartFormer++. Compared with the previous Panoptic-PartFormer, Panoptic-PartFormer++ achieves improvements of 2% PartPQ and 3% PWQ on the Cityscapes PPS dataset and 5% PartPQ on the Pascal Context PPS dataset. On both datasets, Panoptic-PartFormer++ achieves new state-of-the-art results with a significant cost reduction of 70% in GFlops and 50% in parameters. Our models can serve as a strong baseline and aid future research in PPS. Code will be available.